LP-rounding algorithms for facility-location problems 

Jaroslaw Byrka    MohammadReza Ghodsi    Aravind Srinivasan



March 9, 2012 



Abstract 



We study LP-rounding approximation algorithms for metric uncapacitated facility-location problems.
We first give a new analysis for the algorithm of Chudak and Shmoys, which differs from the analysis of
Byrka and Aardal in that now we do not need any bound based on the solution to the dual LP program.
Besides obtaining the optimal bifactor approximation as do Byrka and Aardal, we can now also show
that the algorithm with scaling parameter equaling 1.58 is, in fact, a 1.58-approximation algorithm.
More importantly, we suggest an approach based on additional randomization and analyses such as ours,
which could achieve or approach the conjectured optimal 1.46...-approximation for this basic problem.

Next, using essentially the same techniques, we obtain improved approximation algorithms in the
2-stage stochastic variant of the problem, where we must open a subset of facilities having only
stochastic information about the future demand from the clients. For this problem we obtain a
2.2975-approximation algorithm in the standard setting, and a 2.4957-approximation in the more restricted,
per-scenario setting.

We then study robust fault-tolerant facility location, introduced by Chechik and Peleg: solutions
here are designed to provide low connection cost in case of failure of up to k facilities. Chechik and
Peleg gave a 6.5-approximation algorithm for k = 1 and a (7.5k + 1.5)-approximation algorithm for
general k. We improve this to an LP-rounding (k + 5 + 4/k)-approximation algorithm. We also observe
that in case of oblivious failures the expected approximation ratio can be reduced to k + 1.5, and that
the integrality gap of the natural LP-relaxation of the problem is at least k + 1.

1 Introduction

In facility location problems, we seek a subset of given locations where to build facilities, in order to service 
a given set of clients. The goal is to minimize the total cost of constructing facilities and the clients' service 
cost, which in the metric setting is a function of distances between clients and the facilities that they are 
assigned to. In this paper we will only consider uncapacitated problems, where there is no restriction on 
the number of clients connected to a single facility. (We sometimes use the terminology of opening a subset 
of the existing facilities, rather than constructing them.) We present improved approximation algorithms
for a variety of such problems using LP-rounding; we also sketch a possible approach to achieving or 
approaching the optimal approximation for the basic and perhaps most-studied variant, the Uncapacitated 
Facility Location problem (UFL). 

The UFL is defined as follows. Given a set $\mathcal{F}$ of facilities and a set $\mathcal{C}$ of clients, we aim to open a subset
of facilities and connect every client to an open facility. The cost of opening facility $i$ is $f_i$, and the cost
of connecting client $j$ to facility $i$ is $c_{ij}$; the connection costs are assumed to define a symmetric metric.
The goal is to choose the facilities and connections that minimize the sum of facility-opening costs and
client-connection costs.

The UFL problem is NP-hard and hard to approximate better than the positive solution $s_0 \approx 1.46$ to
the equation $s = 1 + 2e^{-s}$ [6], where $e$ denotes the base of the natural logarithm. Jain et al. [7] generalized
this result to the following bifactor lower bound. Let $F_{OPT}$ and $C_{OPT}$ denote the facility-opening cost and
client-connection cost of an optimal solution. They proved that it is unlikely, for any $\lambda \ge 1$, that there
exists a polynomial-time algorithm that finds a solution with facility-opening cost at most $\lambda \cdot F_{OPT}$ and
connection cost at most $(1 + 2e^{-\lambda}) \cdot C_{OPT}$. On the positive side, Shmoys, Tardos and Aardal [10] provided
a constant factor LP-rounding approximation algorithm that exploits the assumption on the connection
costs being metric. A series of algorithms improving the approximation ratio then followed, borrowing
and contributing to essentially all known major approaches in approximation algorithms. The currently
best-known approximation ratio is reached by the 1.5-approximation algorithm of Byrka and Aardal [2]:
this is obtained by a combination of a greedy algorithm analyzed by a dual fitting technique [9] and a novel
analysis of the LP-rounding algorithm of Chudak and Shmoys [5].

The first result of this paper is to simplify the analysis of Byrka and Aardal. In our case, the expected
connection cost of a single client gets bounded with respect to the fractional connection cost of the client,
rather than w.r.t. a combination of its fractional connection and a dual budget as in [5, 2]. The main
result of [2] remains unaffected: we still obtain that the expected cost of the solution is at most $\gamma \cdot F_{OPT} +
(1 + 2e^{-\gamma}) \cdot C_{OPT}$ if the scaling parameter is $\gamma \ge 1.678$. However, our analysis is purely primal-based, and
thus works if we have an approximately-optimal LP solution as well. (That is, our approximation bounds
will be scaled by $(1 + \epsilon)$ if we have a $(1 + \epsilon)$-approximate solution to the LP, obtained, for instance,
by some fast algorithm.) Furthermore, for smaller values of the parameter $\gamma$, we obtain bounds that are
stronger than in [2]. In particular, for $\gamma = 1.575$ we obtain 1.575-approximate solutions, which was not
known before. Interestingly, the same ratio was previously obtained by yet another analysis, namely an
analysis of Sviridenko [12], who considered essentially the same algorithm, but with the scaling parameter
$\gamma$ drawn randomly from a certain nontrivial distribution. Perhaps most importantly, we suggest a new
type of approach for the UFL based on our analysis, which appears promising in terms of approaching the
optimal approximation of $s_0 \approx 1.46\ldots$.

Next we consider the setting of uncertain, stochastic demand modeled as a 2-stage stochastic optimiza- 
tion problem. In the first stage, given stochastic information about the set of clients that needs to be served 
we decide to open a subset of facilities. Next, in the second stage, the actual set of clients is revealed to 
us and we can open additional facilities. Finally we connect each client to a facility opened in any of the 
stages. The essence of the problem is that facility-opening costs change over time, i.e., it is cheaper to open 
a facility earlier. We make the standard assumption that the stochastic demand is presented to us in the 
form of a polynomial number of possible scenarios, each scenario to be realized with a certain probability. 



The goal is to minimize the total expected cost. Certain algorithms deliver slightly stronger, per-scenario 
bounds, i.e., the cost in each scenario is compared to the fractional cost in this scenario. The 2-stage 
stochastic facility location was introduced by Swamy and Shmoys [13]. The approximation ratio was then 
improved by Srinivasan [11], who obtained a 2.369-approximation in the general expectation setting and
a 3.095-approximation in the per-scenario model. We use the techniques we develop for UFL to improve 
these ratios to 2.2975 and 2.4957 respectively. 

Finally, we consider the Robust Fault-Tolerant Facility Location (RFTFL) problem introduced recently
by Chechik and Peleg [4], and apply some insights from stochastic facility location. In RFTFL, one has
to choose a set of facilities that are in a sense robust: i.e., in case of failure of up to $k$ of the opened
facilities, where $k$ is viewed as a constant, the cost of connecting clients to the facilities that did not fail
should be small. More precisely, we bound the total facility-opening cost plus a worst case client-connection
cost. We start by observing that this problem can be modeled by an IP similar to the one used for the
2-stage stochastic problem. Now we say that facilities are opened only in the first stage and there are $\binom{|\mathcal{F}|}{k}$
scenarios, each of them excluding the use of a certain subset of $k$ facilities. We present an LP-rounding
$(k + 5 + 4/k)$-approximation algorithm, which improves (for $k > 1$) upon the bound of $7.5k + 1.5$ from [4].
We also show that if the scenario is chosen by an oblivious adversary, the bound can be improved to $k + 1.5$.
Finally, we show a natural limit of this LP-rounding method by constructing simple instances for which
the integrality gap of such an LP-relaxation of the problem is at least $k + 1$.

2 Uncapacitated Facility Location problem 

We start with a sketch of the algorithm, and then discuss the crucial steps in more detail. Following this,
we sketch our ideas for approaching the optimal ($s_0 \approx 1.46\ldots$)-approximation.

By $CS(\gamma)$, we denote the algorithm of Chudak and Shmoys [5] with the scaling parameter equaling $\gamma$.
A sketch of the $CS(\gamma)$ algorithm is as follows:

1. Solve the standard LP-relaxation (see below) of UFL. 

2. Modify the fractional solution by: 

• scaling up the facility-opening variables by $\gamma$,

• modifying the connection variables to completely use the "closest" fractionally open facilities, 

• splitting facilities, if necessary, such that there is no slack between the amount that a client is 
assigned to a facility, and the amount by which this facility is opened. 

3. Divide clients into clusters based on the current fractional solution. In each cluster a specific client
is assigned to be a "cluster center". (This is a key step.)

4. For every cluster, open one of the "nearby" facilities of the cluster center. 

5. For each facility not considered above, open it independently with probability equal to its (scaled) 
fractional opening value. 

6. Connect each client to an open facility that is closest to it. 

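IP formulation and relaxation. UFL has a natural formulation as the following integer program.

minimize $\sum_{i \in \mathcal{F}, j \in \mathcal{C}} c_{ij} x_{ij} + \sum_{i \in \mathcal{F}} f_i y_i$ subject to
    $\sum_{i \in \mathcal{F}} x_{ij} = 1$    for all $j \in \mathcal{C}$,
    $x_{ij} - y_i \le 0$    for all $i \in \mathcal{F}, j \in \mathcal{C}$,
    $x_{ij}, y_i \in \{0, 1\}$    for all $i \in \mathcal{F}, j \in \mathcal{C}$.    (1)

A linear relaxation of this IP formulation is obtained by replacing the integrality constraints (1) by the
constraints $x_{ij} \ge 0$, $y_i \ge 0$ for all $i \in \mathcal{F}, j \in \mathcal{C}$. We use this LP relaxation as a lower bound for the cost of
the optimal integral solution.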

Scaling and clustering. Inspired by a filtering technique of Lin and Vitter, the following scaling procedure
has been successfully applied to facility location problems. Suppose that we have solved the LP
relaxation, and that the optimal¹ primal solution is $(x^*, y^*)$. We will start by modifying $(x^*, y^*)$ by scaling
the $y$-variables by a constant $\gamma > 1$ to obtain a fractional solution $(x^*, \bar{y})$, where $\bar{y} = \gamma \cdot y^*$. Note that
by scaling we might set some $\bar{y}_i > 1$. In the filtering of Shmoys et al. such a variable would instantly be
rounded to 1. However, for the compactness of a later part of our analysis it is useful not to round these
variables, but rather to split facilities.

Before we discuss splitting, let us first modify the connection variables. Suppose that the values of the
$y$-variables are scaled and fixed, but that we now have the freedom to change the values of the $x$-variables
in order to minimize the connection cost. For each client $j$ we compute the values of the corresponding
$x$-variables in the following way. We choose an ordering of facilities with non-decreasing distances to client
$j$. We connect client $j$ to the first facilities in the ordering so that among the facilities fractionally serving
$j$, only the last one can be opened by more than the amount by which it serves $j$ (i.e., for any facilities $i$ and $i'$
such that $i'$ is later in the ordering, if $\bar{x}_{ij} < \bar{y}_i$ then $\bar{x}_{i'j} = 0$). In the next step, we eliminate the occurrences of
situations where $0 < \bar{x}_{ij} < \bar{y}_i$. We do so by creating an equivalent instance of the UFL problem, where
facility $i$ is split into two identical facilities $i'$ and $i''$. In the new setting, the opening of facility $i'$ is $\bar{x}_{ij}$ and
the opening of facility $i''$ is $\bar{y}_i - \bar{x}_{ij}$. The values of the $x$-variables are updated accordingly. By repeatedly
applying this procedure we obtain a so-called complete solution $(\bar{x}, \bar{y})$, i.e., a solution in which no pair
$i \in \mathcal{F}, j \in \mathcal{C}$ exists such that $0 < \bar{x}_{ij} < \bar{y}_i$ (see [12, Lemma 1] for a more detailed argument).
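
The reassignment step for a single client can be illustrated with a short Python sketch. This is only an illustration under simplifying assumptions, not the implementation from [5] or [12]: the function names `reassign_client` and `split_needed`, the list-based inputs, and the tolerance `eps` are all hypothetical choices made for this example.

```python
def reassign_client(y_bar, dist, eps=1e-12):
    """Greedily reassign one client to its nearest fractionally open facilities.

    y_bar[i] is the scaled opening of facility i and dist[i] its distance to
    the client. Returns {facility: x_bar}; the values sum to 1 provided that
    sum(y_bar) >= 1, which holds after scaling a feasible LP solution.
    """
    x_bar, remaining = {}, 1.0
    for i in sorted(range(len(y_bar)), key=lambda i: dist[i]):
        if remaining <= eps:
            break
        # Only the last facility used can receive less than its full opening.
        x_bar[i] = min(y_bar[i], remaining)
        remaining -= x_bar[i]
    return x_bar


def split_needed(y_bar, x_bar, eps=1e-12):
    """Facilities with 0 < x_bar[i] < y_bar[i]. Each of them would be split into
    two copies with openings x_bar[i] and y_bar[i] - x_bar[i]."""
    return [i for i, x in x_bar.items() if eps < x < y_bar[i] - eps]
```

For instance, with three facilities of scaled opening 0.6 each, the client uses the nearest one fully and the second nearest only partially, so only the second nearest facility has to be split.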

Based on the complete fractional solution $(\bar{x}, \bar{y})$, some of the facilities in our instance of UFL are
grouped into clusters. (It is sometimes intuitive to view clusters as sets of clients instead.) Each cluster of
facilities is created by picking a client as cluster center and creating a cluster from the facilities serving it
in $(\bar{x}, \bar{y})$. We require that no facility belong to more than one cluster. Therefore when a cluster center is
picked, any client that shares a facility with the cluster center can no longer be picked as the center of a
new cluster. The algorithm of [5] uses the following procedure to obtain the clustering: while not all the
clients are processed, greedily choose (in the manner described next) a new cluster center $j$, and build a
cluster from $j$ and facilities serving $j$ in $(\bar{x}, \bar{y})$. Remove $j$ and any client that shares a facility with $j$. The
greedy choice of the next cluster center depends on the distances between the clients and facilities serving
them. For each client $j$, compute: (i) $d_j^{(c)}$, the average distance from $j$ to facilities serving it in $(\bar{x}, \bar{y})$, and
(ii) $d_j^{(max)}$, the maximum distance to any facility serving it in $(\bar{x}, \bar{y})$. (For more formal definitions see the
analysis of the algorithm.) The next cluster center is the remaining client with smallest $d_j^{(c)} + d_j^{(max)}$.

To obtain the integral solution, we round the fractional variables as follows. For each cluster the
complete fractional opening variables $\bar{y}_i$ sum to 1. We open exactly one facility within each cluster, with
probabilities equal to the $\bar{y}_i$'s. Any facility that is not in a cluster is opened independently with probability
$\bar{y}_i$. Each client is connected to the closest open facility.
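
The clustering and rounding steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' code: it assumes the input is already a complete solution, given by a hypothetical list `close[j]` of the facilities serving client `j` (whose openings sum to 1), and the name `cs_round` and the data layout are illustrative only.

```python
import random

def cs_round(y_bar, c, close, rng):
    """Clustering and randomized rounding on a complete fractional solution.

    y_bar[i]: opening of (split) facility i; c[i][j]: distance from i to client j;
    close[j]: facilities serving client j, whose y_bar values sum to 1.
    Returns (open facilities, clusters, assignment client -> facility).
    """
    clients = range(len(close))

    def greedy_key(j):
        avg = sum(y_bar[i] * c[i][j] for i in close[j])  # average distance to serving facilities
        return avg + max(c[i][j] for i in close[j])      # plus the maximum such distance

    clusters, clustered = [], set()
    for j in sorted(clients, key=greedy_key):
        # j becomes a cluster center only if it shares no facility with an earlier center.
        if clustered.isdisjoint(close[j]):
            clusters.append(list(close[j]))
            clustered.update(close[j])

    opened = set()
    for cluster in clusters:
        # Exactly one facility per cluster, chosen with probabilities y_bar[i].
        opened.add(rng.choices(cluster, weights=[y_bar[i] for i in cluster])[0])
    for i in range(len(y_bar)):
        # Unclustered facilities are opened independently.
        if i not in clustered and rng.random() < y_bar[i]:
            opened.add(i)

    assignment = {j: min(opened, key=lambda i: c[i][j]) for j in clients}
    return opened, clusters, assignment
```

Processing clients in sorted order and skipping those that touch an already clustered facility is equivalent to the "choose a center, then remove its neighbours" loop described above, because the greedy keys do not change during the procedure.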

2.1 Analysis 

We will use the average distances between single clients and groups of facilities defined as follows. For any
client $j \in \mathcal{C}$, and for any subset of facilities $\mathcal{F}' \subset \mathcal{F}$ such that $\sum_{i \in \mathcal{F}'} \bar{y}_i > 0$, let
\[ d(j, \mathcal{F}') = \frac{\sum_{i \in \mathcal{F}'} c_{ij} \cdot \bar{y}_i}{\sum_{i \in \mathcal{F}'} \bar{y}_i}. \]
We will call the set of facilities $i \in \mathcal{F}$ such that $\bar{x}_{ij} > 0$ the set of close facilities of client $j$ and we denote it
by $C_j$. By analogy, we will call the set of facilities $i \in \mathcal{F}$ such that $x^*_{ij} > 0$ and $\bar{x}_{ij} = 0$ the set of distant
facilities of client $j$ and denote it $D_j$. Observe that $C_j \cap D_j = \emptyset$ for each client $j$.

Figure 1: Distances to facilities serving client $j$ in $(x^*, y^*)$. The width of a rectangle corresponding to
facility $i$ is equal to $\gamma \cdot x^*_{ij} = \bar{y}_i$. This figure helps us understand the meaning of $\rho_j$.

¹Our algorithm is entirely primal and therefore may start with an arbitrary feasible fractional solution; we only start with
an optimal fractional solution to be sure that its cost is no more than the cost of the optimal integral one.

We are interested in average distances from a client $j$ to sets of facilities fractionally serving it. Let $d_j$
be the average connection cost in $x^*_{ij}$ defined as
\[ d_j = \sum_{i \in \mathcal{F}} c_{ij} \cdot x^*_{ij}. \]
Let $d_j^{(c)}$, $d_j^{(d)}$ be the average distances to close and distant facilities defined as
\[ d_j^{(c)} = d(j, C_j), \qquad d_j^{(d)} = d(j, D_j). \]
Let $d_j^{(max)} = \max_{i \in C_j} c_{ij}$ be the maximum distance of $j$ to its close facilities, as mentioned above.
Let $\rho_j$ be defined as
\[ \rho_j = \frac{d_j - d_j^{(c)}}{d_j} \]
if $d_j > 0$, and define $\rho_j = 0$ otherwise. Observe that $\rho_j$ takes value
between 0 and 1. $\rho_j = 0$ implies $d_j^{(c)} = d_j = d_j^{(d)}$. Large values of
the parameter $\rho_j$ are desirable for one part of the analysis, and small values are good for another part.

Lemma 2.1 $d_j^{(d)} = d_j \left(1 + \frac{\rho_j}{\gamma - 1}\right)$.

Observe that $d_j^{(max)} \le d_j^{(d)}$. We will also use the following lemmas from [2]:



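Lemma 2.2 Suppose $\gamma < 2$ and that clients $j, j' \in \mathcal{C}$ share a facility in $(\bar{x}, \bar{y})$, i.e., $\exists i \in \mathcal{F}$ s.t. $\bar{x}_{ij} > 0$
and $\bar{x}_{ij'} > 0$. Then, either $C_{j'} \setminus (C_j \cup D_j) = \emptyset$ or
\[ d(j, C_{j'} \setminus (C_j \cup D_j)) \le d_j^{(d)} + d_{j'}^{(max)} + d_{j'}^{(c)}. \]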

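Lemma 2.3 Given are a random vector $y \in \{0, 1\}^{|\mathcal{F}|}$ produced by Algorithm $CS(\gamma)$, a subset $A \subseteq \mathcal{F}$
of facilities such that $\sum_{i \in A} \bar{y}_i > 0$, and a client $j \in \mathcal{C}$. Then, the following holds:
\[ E\Big[ \min_{i \in A,\, y_i = 1} c_{ij} \;\Big|\; \sum_{i \in A} y_i \ge 1 \Big] \le d(j, A). \]

In the analysis of our algorithm we will also use the following result:

Lemma 2.4 Suppose we are given $n$ independent events that occur with probabilities $p_1, p_2, \ldots, p_n$ respectively.
Then at least one of these events occurs with probability at least $1 - e^{-\sum_{i=1}^{n} p_i}$.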

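Let $\gamma_0$ be defined as the only positive solution to the following equation.
\[ \frac{\frac{1}{e} + \frac{1}{e^{\gamma}}}{1 - \frac{1}{\gamma}} = 1 + \frac{2}{e^{\gamma}} \qquad (2) \]
An approximate value of this constant is $\gamma_0 \approx 1.67736$. As we will observe in the proof of Theorem 2.5,
equation (2) appears naturally in the analysis of algorithm $CS(\gamma)$.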

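Theorem 2.5 For $1 < \gamma < 2$, Algorithm $CS(\gamma)$ produces a solution with expected costs
\[ E[cost(OPEN_i)] = \gamma \cdot F_i^*, \]
\[ E[cost(CONN_j)] \le \max\left\{ 1 + 2e^{-\gamma},\ \frac{\frac{1}{e} + \frac{1}{e^{\gamma}}}{1 - \frac{1}{\gamma}} \right\} \cdot C_j^*, \]
where $F_i^* = f_i y_i^*$, $C_j^* = \sum_{i \in \mathcal{F}} c_{ij} x^*_{ij}$, $F^* = \sum_{i \in \mathcal{F}} F_i^*$ and $C^* = \sum_{j \in \mathcal{C}} C_j^*$.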

Proof: The expected facility-opening cost is $E[cost(OPEN_i)] = f_i \bar{y}_i = \gamma f_i y_i^* = \gamma \cdot F_i^*$.

To bound the expected connection cost, we show that for each client $j$ there is an open facility within
a certain distance with a certain probability. If $j$ is a cluster center, one of its close facilities is open and
the expected distance to this open facility is $d_j^{(c)}$.

If $j$ is not a cluster center, it first considers its close facilities (see Figure 2). If any of them is open,
by Lemma 2.3 the expected distance to the closest open facility is at most $d_j^{(c)}$. From Lemma 2.4, at least
one close facility is open with probability $p_c \ge 1 - \frac{1}{e}$. Suppose none of the close facilities of $j$ is open,
but at least one of its distant facilities is open. Let $p_d$ denote the probability of this event. Again by
Lemma 2.3, the expected distance to the closest facility is then at most $d_j^{(d)}$. If neither any close nor any
distant facility of client $j$ is open, then $j$ connects itself to the facility serving its cluster center $j'$. Again
from Lemma 2.4, such an event happens with probability $p_s \le \frac{1}{e^{\gamma}}$. We will now use the fact that if $\gamma < 2$
then, by Lemma 2.2 and Lemma 2.3, the expected distance from $j$ to the facility opened around $j'$ is at
most $d_j^{(d)} + d_{j'}^{(max)} + d_{j'}^{(c)}$.

Finally, we combine the probabilities of particular cases with the bounds on the expected connection
for each of the cases, to obtain the following upper bound on the expected connection cost.
\[ E[cost(CONN_j)] \le p_c \cdot d_j^{(c)} + p_d \cdot d_j^{(d)} + p_s \cdot \left( d_j^{(d)} + d_{j'}^{(max)} + d_{j'}^{(c)} \right) \]

Figure 2: Facilities that $j$ considers: close and distant, as well as close facilities of its cluster center $j'$.

The penultimate line above is due to the fact that $0 \le \rho_j \le 1$. The total cost follows easily. Therefore,
$CS(\gamma)$ is a $\left(\gamma,\ \max\left\{ 1 + 2e^{-\gamma},\ \frac{1/e + 1/e^{\gamma}}{1 - 1/\gamma} \right\}\right)$ bi-factor approximation for UFL ($1 < \gamma < 2$). Note that
$(\gamma, 1 + 2e^{-\gamma})$ is the bi-factor approximation lower bound [7, 6].



□

Corollary 2.6 The $CS(1.575)$ algorithm is a 1.575-approximation algorithm for the UFL problem.
Also, $CS(\gamma)$ is an optimal bi-factor approximation for UFL for $\gamma_0 \le \gamma < 2$.
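
The constants quoted in this section can be checked numerically. The sketch below is a sanity check only, and it rests on an assumption: the second term of the maximum in Theorem 2.5 is taken to be $(1/e + e^{-\gamma})/(1 - 1/\gamma)$, which is our reading of the source formula. The helper names `connection_factor` and `bisect` are hypothetical.

```python
import math

def connection_factor(gamma):
    """Assumed form of the connection-cost factor in Theorem 2.5."""
    lower_bound_term = 1 + 2 * math.exp(-gamma)
    second_term = (1 / math.e + math.exp(-gamma)) / (1 - 1 / gamma)
    return max(lower_bound_term, second_term)

def bisect(f, lo, hi, iters=100):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# s_0: positive solution of s = 1 + 2 e^{-s}.
s0 = bisect(lambda s: s - 1 - 2 * math.exp(-s), 1.0, 2.0)

# gamma_0: the point where the two terms of the maximum coincide.
gamma0 = bisect(
    lambda g: (1 / math.e + math.exp(-g)) / (1 - 1 / g) - (1 + 2 * math.exp(-g)),
    1.1, 2.0,
)

print(s0, gamma0, connection_factor(1.575))
```

Under this assumption, the factor at $\gamma = 1.575$ stays just below 1.575, so both the facility and the connection factor are at most 1.575, and for $\gamma \ge \gamma_0$ the maximum is attained by the lower-bound term $1 + 2e^{-\gamma}$.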

2.2 Approaching an optimal approximation 

Our purely primal-based analysis of $CS(\gamma)$ suggests a way to approach the optimal $s_0$-approximation.
Our analysis shows that if the adversary knows the value of $\gamma$, then the adversary's optimal strategy, in
selecting a "bad" instance and corresponding LP solution $(x^*, y^*)$, is just to make one of two choices, as
follows. For each $j$,

• either all facilities $i$ with $x^*_{ij} > 0$ are at the same distance from $j$,

• or there exist $0 \le a_j < b_j$ such that all facilities $i$ with $x^*_{ij} > 0$ are at distance either $a_j$ or $b_j$ from $j$,
such that $\sum_{i:\, c_{ij} = a_j} x^*_{ij}$ is infinitesimally smaller than $1/\gamma$.

Given $\gamma$, the adversary will make the above choice in order to maximize the expected objective-function
value after running $CS(\gamma)$. (This optimization, as well as the corresponding choices for $a_j$ and $b_j$, can be
carried out explicitly in a straightforward manner.)

However, what if we select $\gamma$ randomly from some appropriate distribution? If the adversary selects
the second choice above, then a suitably "large" value of $\gamma$ will defeat the adversary's goal of making
$\sum_{i:\, c_{ij} = a_j} x^*_{ij}$ (infinitesimally) smaller than $1/\gamma$, leading to a significantly-improved approximation. On the
other hand, if the adversary selects the first choice above, then a choice of $\gamma$ close to $s_0$ will lead to an
approximation close to $s_0$.

We are unable at the moment to carry out the above idea in an optimal manner: choosing the optimal
distribution for $\gamma$, for instance. However, we view this as a potentially-fruitful approach since it starts with
the knowledge of what exactly are the limits of deterministic strategies for choosing $\gamma$.

3 Two-stage stochastic facility location 

We now consider two-stage stochastic facility location in the "explicitly-given polynomially-many scenarios"
model presented in the introduction. The LP-rounding problem is thus to find integral solutions "close"
to the optimal solution of the following LP, where the scenarios are indexed by the symbol $A$ and their
respective probabilities are given by the values $p_A$:

minimize $\sum_{i \in \mathcal{F}} f_i^I y_i + \sum_A p_A \left( \sum_i f_i^A y_{A,i} + \sum_{j \in A} \sum_i c_{ij} x_{A,ij} \right)$ subject to
    $\sum_i x_{A,ij} \ge 1$    $\forall A\ \forall j \in A$;
    $x_{A,ij} \le y_i + y_{A,i}$    $\forall i\ \forall A\ \forall j \in A$;
    $x_{A,ij}, y_i, y_{A,i} \ge 0$    $\forall i\ \forall A\ \forall j \in A$.

First, in Section 3.1 we give an algorithm for the standard setting, then in Section 3.2 we consider the
per-scenario version of the problem.

3.1 General expected cost: analysis with a dual bound 

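Consider the following dual formulation of the 2-stage stochastic facility location problem:

maximize $\sum_A p_A \sum_{j \in A} v_{j,A}$ subject to:
    $w_{ij,A} \ge v_{j,A} - c_{ij}$    $\forall i\ \forall A\ \forall j \in A$;
    $\sum_A p_A \sum_{j \in A} w_{ij,A} \le f_i^I$    $\forall i$;
    $\sum_{j \in A} w_{ij,A} \le f_i^A$    $\forall i\ \forall A$;
    $w_{ij,A}, v_{j,A} \ge 0$    $\forall i\ \forall A\ \forall j \in A$.

Let $(x^*, y^*)$ and $(v^*, w^*)$ be optimal solutions to the primal and the dual programs, respectively. Note
that by complementary slackness, we have $c_{ij} \le v^*_{j,A}$ if $x^*_{A,ij} > 0$.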

Algorithm. We now describe a randomized LP-rounding algorithm that transforms the fractional solution
$(x^*, y^*)$ into an integral solution $(\hat{x}, \hat{y})$ with bounded expected cost. The expectation is over the
random choices of the algorithm, but not over the random choice of the scenario. Note that we need to
decide the first stage entries of $\hat{y}$ not knowing $A$. W.l.o.g. we assume that no facility is fractionally opened
in $(x^*, y^*)$ in both stages, i.e., for all $i$ we have $y^*_i = 0$ or for all $A$ $y^*_{A,i} = 0$. To obtain this property it
suffices to have two identical copies of each facility, one for Stage I and one for Stage II.

We start by scaling the fractional solution $(x^*, y^*)$ by a factor of 2. As a result, we obtain a fractional
solution $(\bar{x}, \bar{y})$ with $\bar{x}_{A,ij} = 2 \cdot x^*_{A,ij}$, $\bar{y}_i = 2 \cdot y^*_i$, and $\bar{y}_{A,i} = 2 \cdot y^*_{A,i}$. Note that the scaled fractional solution
$(\bar{x}, \bar{y})$ can have facilities with fractional opening of more than 1. For simplicity of the analysis, we do not
round these facility-opening values to 1, but rather split such facilities. More precisely, we split each facility
$i$ with fractional opening $\bar{y}_i > \bar{x}_{A,ij} > 0$ (or $\bar{y}_{A,i} > \bar{x}_{A,ij} > 0$) for some $(A, j)$ into $i'$ and $i''$, such that
$\bar{y}_{i'} = \bar{x}_{A,ij}$ and $\bar{y}_{i''} = \bar{y}_i - \bar{x}_{A,ij}$. We also split facilities whose fractional opening exceeds one. By splitting
facilities we create another instance of the problem, then we solve this modified instance and interpret the
solution as a solution to the original problem in the natural way. The technique of splitting facilities is
precisely described in [12].



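Define $\bar{x}^{I}_{A,ij} = \min\{\bar{x}_{A,ij}, \bar{y}_i\}$, and $\bar{x}^{II}_{A,ij} = \bar{x}_{A,ij} - \bar{x}^{I}_{A,ij}$. Observe that for a client-scenario pair $(j, A)$
either $\sum_{i \in \mathcal{F}} \bar{x}^{I}_{A,ij} \ge 1$, or $\sum_{i \in \mathcal{F}} \bar{x}^{II}_{A,ij} \ge 1$. In the former case, we call such a pair first stage served, and we
denote the set of the first stage served pairs by $S$.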
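
A small Python sketch may help to see why one of the two sums must reach 1 after scaling by 2. It treats a single client-scenario pair; the function name `split_by_stage`, the list inputs, and the numerical tolerance are assumptions made for this illustration only.

```python
def split_by_stage(x_star, y1_star, scale=2.0, tol=1e-9):
    """Split the scaled connection of one client-scenario pair into stage parts.

    x_star[i]: fractional connection to facility i in this scenario;
    y1_star[i]: fractional first-stage opening of facility i.
    Returns (x_I, x_II, first_stage_served).
    """
    x_bar = [scale * x for x in x_star]
    y_bar = [scale * y for y in y1_star]
    x_I = [min(x, y) for x, y in zip(x_bar, y_bar)]   # part covered by Stage I openings
    x_II = [x - xi for x, xi in zip(x_bar, x_I)]      # remainder, covered by Stage II openings
    return x_I, x_II, sum(x_I) >= 1 - tol
```

Since the scaled connection values of a pair sum to at least 2 and split exactly into the two parts, at least one part sums to at least 1; a pair is placed in $S$ when the first-stage part does.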

Since we can split facilities, for each $(j, A) \in S$ we can assume that there exists a subset of facilities
$F_{(j,A)} \subseteq \mathcal{F}$, such that $\sum_{i \in F_{(j,A)}} \bar{x}^{I}_{A,ij} = 1$, and for each $i \in F_{(j,A)}$ we have $\bar{x}^{I}_{A,ij} = \bar{y}_i$. Also for each $(j, A) \notin S$
we can assume that there exists a subset of facilities $F_{(j,A)} \subseteq \mathcal{F}$, such that $\sum_{i \in F_{(j,A)}} \bar{x}^{II}_{A,ij} = 1$, and for each
$i \in F_{(j,A)}$ we have $\bar{x}^{II}_{A,ij} = \bar{y}_{A,i}$. Let $R_{(j,A)} = \max_{i \in F_{(j,A)}} c_{ij}$ be the maximal distance from $j$ to an $i \in F_{(j,A)}$.
Recall that, by complementary slackness, we have $R_{(j,A)} \le v^*_{j,A}$.

The algorithm opens facilities randomly in each of the stages with the probability of opening facility $i$
equal to $\bar{y}_i$ in Stage I, and $\bar{y}_{A,i}$ in Stage II of scenario $A$. Some facilities are grouped in disjoint clusters in
order to correlate the opening of facilities from a single cluster. The clusters are formed in each stage by
the following procedure. Let all facilities be initially unclustered. In Stage I, consider all client-scenario
pairs $(j, A) \in S$ (in Stage II of scenario $A$, consider all clients $j$ such that $(j, A) \notin S$) in the order of
non-decreasing values $R_{(j,A)}$. If the set of facilities $F_{(j,A)}$ contains no facility from the previously formed
clusters, then form a new cluster containing facilities from $F_{(j,A)}$, otherwise do nothing. In each stage,
open exactly one facility in each cluster. Recall that the total fractional opening of facilities in each cluster
equals 1. Within each cluster choose the facility randomly with the probability of opening facility $i$ equal
to the fractional opening $\bar{y}_i$ in Stage I, or $\bar{y}_{A,i}$ in Stage II of scenario $A$. For each unclustered facility $i$
open it independently with probability $\bar{y}_i$ in Stage I, and with probability $\bar{y}_{A,i}$ in Stage II of scenario $A$.
Finally, at the end of Stage II of scenario $A$, connect each client $j \in A$ to the closest open facility.

Analysis. Consider the solution $(\hat{x}, \hat{y})$ constructed by our LP-rounding algorithm. We fix scenario $A$
and bound the expectation of $COST(A) = \sum_{i \in \mathcal{F}} (f_i^I \hat{y}_i + f_i^A \hat{y}_{A,i}) + \sum_{j \in A} \sum_{i \in \mathcal{F}} c_{ij} \hat{x}_{A,ij}$. Define
$C_A = \sum_{j \in A} C_{(j,A)} = \sum_{j \in A} \sum_{i \in \mathcal{F}} c_{ij} x^*_{A,ij}$, $F_A = \sum_{i \in \mathcal{F}} (f_i^I y^*_i + f_i^A y^*_{A,i})$, $V_A = \sum_{j \in A} v^*_{j,A}$.

Lemma 3.1 $E[COST(A)] \le e^{-2} \cdot 3 \cdot V_A + (1 - e^{-2}) \cdot C_A + 2 \cdot F_A$ in each scenario $A$.

Proof: Since the probability of opening a facility is equal to its fractional opening in $(\bar{x}, \bar{y})$, the expected
facility-opening cost of $(\hat{x}, \hat{y})$ equals the facility-opening cost of $(\bar{x}, \bar{y})$, which is exactly twice the facility-opening
cost of $(x^*, y^*)$.



Fix a client $j \in A$. The total (from both stages) fractional opening in $\bar{y}$ of facilities serving $j$ in $(\bar{x}, \bar{y})$
is exactly 2, hence the probability that at least one of these facilities is open in $(\hat{x}, \hat{y})$ is at least $1 - e^{-2}$.
Observe that, on the condition that at least one such facility is open, by an argument analogous to the one from
Lemma 2.3, the expected distance to the closest of the open facilities is at most $C_{(j,A)}$.

With probability at most $e^{-2}$, none of the facilities fractionally serving $j$ in $(\bar{x}, \bar{y})$ is open. In such a
case we need to find a different facility to serve $j$. We will now prove that for each client $j \in A$ there exists
a facility $i$ which is open in $(\hat{x}, \hat{y})$, such that $c_{ij} \le 3 \cdot v^*_{j,A}$.

Assume $(j, A) \in S$ (for $(j, A) \notin S$ the argument is analogous). If $F_{(j,A)}$ is a cluster, then at least one
$i \in F_{(j,A)}$ is open and $c_{ij} \le v^*_{j,A}$. Suppose $F_{(j,A)}$ is not a cluster; then by the construction of clusters,
it intersects a cluster $F_{(j',A')}$ with $R_{(j',A')} \le R_{(j,A)} \le v^*_{j,A}$. Let $i$ be the facility opened in cluster $F_{(j',A')}$
and let $i' \in F_{(j',A')} \cap F_{(j,A)}$. Since $i'$ is in $F_{(j,A)}$, $c_{i'j} \le R_{(j,A)}$. Since both $i$ and $i'$ are in $F_{(j',A')}$, both
$c_{ij'} \le R_{(j',A')}$ and $c_{i'j'} \le R_{(j',A')}$. Hence, by triangle inequality, $c_{ij} \le R_{(j,A)} + 2 \cdot R_{(j',A')} \le 3 \cdot R_{(j,A)} \le 3 \cdot v^*_{j,A}$.

Thus, the expected cost of the solution in scenario $A$ is:
\[ E[COST(A)] \le e^{-2} \cdot 3 \cdot \sum_{j \in A} v^*_{j,A} + (1 - e^{-2}) \Big( \sum_{j \in A} \sum_{i \in \mathcal{F}} c_{ij} x^*_{A,ij} \Big) + 2 \cdot \Big( \sum_{i \in \mathcal{F}} (f_i^I y^*_i + f_i^A y^*_{A,i}) \Big) \]
\[ = e^{-2} \cdot 3 \cdot V_A + (1 - e^{-2}) \cdot C_A + 2 \cdot F_A. \]

□

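Define $F^* = \sum_{i \in \mathcal{F}} f_i^I y^*_i + \sum_A p_A \big( \sum_i f_i^A y^*_{A,i} \big)$ and $C^* = \sum_A p_A \big( \sum_{j \in A} \sum_i c_{ij} x^*_{A,ij} \big)$. Note that we have
$F^* = \sum_A p_A F_A$, $C^* = \sum_A p_A C_A$, and $F^* + C^* = \sum_A p_A V_A$. Summing up the expected cost over scenarios
we obtain the following estimate on the general expected cost, where the expectation is both on the choice
of the scenario and on the random choices of our algorithm.

Corollary 3.2 $E[COST(\hat{x}, \hat{y})] \le 2.4061 \cdot F^* + 1.2707 \cdot C^*$.

Proof:
\[ E[COST(\hat{x}, \hat{y})] = \sum_A p_A E[COST(A)] \]
\[ \le \sum_A p_A \left( e^{-2} \cdot 3 \cdot V_A + (1 - e^{-2}) \cdot C_A + 2 \cdot F_A \right) \]
\[ = (1 - e^{-2}) \cdot C^* + 2 \cdot F^* + 3e^{-2} \Big( \sum_A p_A V_A \Big) \]
\[ = (1 - e^{-2}) \cdot C^* + 2 \cdot F^* + 3e^{-2} (F^* + C^*) \]
\[ = (2 + e^{-2} \cdot 3) F^* + (1 + e^{-2} \cdot 2) C^* \]
\[ \le 2.4061 \cdot F^* + 1.2707 \cdot C^*. \]

□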
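
The two numerical coefficients in Corollary 3.2 follow from evaluating $2 + 3e^{-2}$ and $1 + 2e^{-2}$. The short Python check below is only an illustration; the variable names are chosen for this example.

```python
import math

e2 = math.exp(-2)
facility_coeff = 2 + 3 * e2      # coefficient of F*: 2 from scaling, 3e^{-2} from the dual bound
connection_coeff = 1 + 2 * e2    # coefficient of C*: (1 - e^{-2}) + 3e^{-2}

print(facility_coeff, connection_coeff)
```

Both values round up to the constants 2.4061 and 1.2707 stated in the corollary.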

Combining two algorithms. The above described algorithm can be combined with an algorithm
from [11] to obtain a 2.2975-approximation algorithm. See Appendix A for details.

3.2 Per-scenario bounds: primal analysis 

Consider again the 2-stage facility location problem, and a corresponding optimal fractional solution. We
now describe a randomized rounding scheme so that for each scenario $A$, its expected final (rounded)
cost is at most 2.4061 times its fractional counterpart $Val_A = \sum_{i \in \mathcal{F}} (f_i^I y^*_i + f_i^A y^*_{A,i}) + \sum_{j \in A} \sum_{i \in \mathcal{F}} c_{ij} x^*_{A,ij}$,
improving on the $3.095 \cdot Val_A$ bound of [11].

Note that we cannot use dual bounds in this setting, as the dual budgets $V_A$ do not have to equal $Val_A$
in each scenario (see Appendix B for an example). Instead, we scale the facility-opening values a little
more and show that scaling by a factor of 2.4061 is sufficient to bound the expected connection cost in each
scenario by 2.4061 times the fractional connection cost in this scenario. Details are given in Appendix B.

4 Robust fault-tolerant UFL 

In recent work, Chechik and Peleg have introduced a new variant of facility location problems [4]. They
study a setting that can be described as follows. Once we choose the facilities to open, an adversary closes 
up to k of them (which models possible failures of facilities) , and then clients are connected to the closest 
of the remaining open facilities. The goal is to minimize the facility-opening cost plus the worst-case (over 
the choice of facilities to close) connection cost. Observe that integral solutions (x,y) of the following linear 
program are exactly the feasible solutions to the problem we study. 

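minimize $\sum_{i \in \mathcal{F}} f_i y_i + \max_A \sum_j \sum_i c_{ij} x_{A,ij}$ subject to    (3)
    $\sum_i x_{A,ij} \ge 1$    $\forall A\ \forall j$;    (4)
    $x_{A,ij} \le y_i$    $\forall i\ \forall A\ \forall j$;    (5)
    $x_{A,ij} = 0$    $\forall A\ \forall i \in A\ \forall j$;    (6)
    $x_{A,ij}, y_i, y_{A,i} \ge 0$    $\forall i\ \forall A\ \forall j$.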

The "scenarios" A in the above program are all the subsets of facilities of cardinality k, and they encode 
the facilities closed by the adversary. Note that the connection cost is calculated as a maximum over the 
scenarios. The above program is of polynomial size only for fixed k and we will only study settings with 
such small k. 

In [4] Chechik and Peleg gave a 6.5-approximation algorithm for $k = 1$ and a $(7.5k + 1.5)$-approximation
algorithm for general $k$. We improve the latter to a $(k + 5 + 4/k)$-approximation algorithm for the studied
problem. This we obtain by showing that scaling the facility opening variables by $(k + 5 + 4/k)$ is sufficient
to provide enough fractional opening, which is then rounded by a dependent rounding method described
in [3] (Sections 3 and 4). More details are given in the appendix.
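
To see where the new ratio improves on the earlier bounds, the two guarantees can be tabulated for small $k$. The following Python snippet is a plain arithmetic illustration; the function names are hypothetical.

```python
def ratio_new(k):
    """LP-rounding guarantee k + 5 + 4/k."""
    return k + 5 + 4 / k

def ratio_chechik_peleg(k):
    """Earlier guarantees: 6.5 for k = 1 and 7.5k + 1.5 for general k."""
    return 6.5 if k == 1 else 7.5 * k + 1.5

for k in range(1, 6):
    print(k, ratio_new(k), ratio_chechik_peleg(k))
```

For $k = 1$ the formula $k + 5 + 4/k$ evaluates to 10, which does not beat the dedicated 6.5-approximation; the improvement applies for $k > 1$, and the gap grows linearly in $k$.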

We also briefly discuss an oblivious version of the problem, where the adversary does not know the random
choices of the algorithm when deciding the facilities to close. In this setting we provide a randomized
LP-rounding algorithm that delivers integral solutions of expected cost at most $k + 1.5$ times the cost of
the initial fractional solution (see the appendix).

We also show that these methods cannot be extended to obtain approximation ratios sublinear in $k$ by
providing instances with integrality gap that are arbitrarily close to $k + 1$ (see Appendix C).
Appendix
A Combining two algorithms for 2-stage stochastic facility location 

We have described an algorithm that returns solutions of expected cost at most $2.4061 \cdot F^* + 1.2707 \cdot C^*$.
Let us call this algorithm ALG1.

In [11], Srinivasan gave a different approximation algorithm for our 2-stage stochastic facility location
problem. This algorithm also splits the client-scenario pairs into two groups, namely those to be connected 
in the first stage, and those that are left to be connected in the second stage. The decision is made by 
comparing fractional "first stage" connection of each pair with a certain threshold. Once the split is made, 
the obtained instances of the standard Uncapacitated Facility Location problem are solved with the JMS 
algorithm [7]. The threshold is chosen randomly from a distribution parametrized by $\alpha$. For the choice of
the parameter $\alpha = 0.2485$, the resulting algorithm is shown in [11] to be a 2.369-approximation algorithm.
It is easy to show that by setting $\alpha = 0.37$ in the algorithm of [11], we obtain an algorithm that returns
solutions of expected cost at most $2.24152 \cdot F^* + 2.8254 \cdot C^*$. We will call this algorithm ALG2.

Consider the algorithm ALG3, which tosses a coin that comes up heads with probability p = 0.3396. If the
coin comes up heads, then ALG1 is executed; if it comes up tails, ALG2 is used. The expected cost of the solution
produced by ALG3 can be estimated as
$F^* (p \cdot 2.4061 + (1-p) \cdot 2.24152) + C^* (p \cdot 1.2707 + (1-p) \cdot 2.8254) \le 2.2975\,(C^* + F^*)$.
Therefore, ALG3 is a 2.2975-approximation algorithm for the 2-stage stochastic facility
location problem.
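The arithmetic behind the coin-toss combination can be checked with a short sketch. The function name `mix` is a hypothetical helper; the coefficients and p = 0.3396 are the ones stated in the text.

```python
def mix(p, alg1, alg2):
    """Coefficients (on F*, on C*) of running alg1 with probability p and alg2 otherwise."""
    return tuple(p * a + (1 - p) * b for a, b in zip(alg1, alg2))

ALG1 = (2.4061, 1.2707)    # (F* coefficient, C* coefficient) of ALG1
ALG2 = (2.24152, 2.8254)   # (F* coefficient, C* coefficient) of ALG2

f_coef, c_coef = mix(0.3396, ALG1, ALG2)
print(f_coef, c_coef)      # both values should not exceed 2.2975
```

The value of p is chosen so that the two coefficients nearly coincide, which is what makes the bifactor guarantees collapse into a single approximation ratio.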
B 2-stage stochastic facility location with per-scenario bounds

Let us first note that it is not possible to directly use the analysis from the previous setting in the
per-scenario model. This is because the dual costs $V_A$ need not be equal to $\mathrm{Val}_A = F^*_A + C^*_A$ in each scenario
A. It is possible, for instance, that the fractional opening of a facility in the first stage is entirely paid from
the dual budget of a single scenario, despite the fact that clients not active in this scenario benefit from
the facility being open. This can be observed, e.g., in the following simple example. Consider two clients
$c_1$ and $c_2$, and two facilities $f_1$ and $f_2$. All client-facility distances are 1, except $c_{1,2} = \mathrm{dist}(c_1, f_2) = 3$.
Scenarios are $A_1 = \{c_1\}$ and $A_2 = \{c_2\}$, and they occur with probability 1/2 each. The facility-opening
costs are $f^I_1 = 2$, $f^I_2 = \epsilon$, and $f^A_1 = f^A_2 = 4$ for both scenarios A. It is easy to see that the only optimal
fractional solution is integral: it opens facility $f_1$ in the first stage and opens no more facilities in
the second stage. Therefore, $\mathrm{Val}(A_1) = \mathrm{Val}(A_2) = 3$. However, in the dual problem, client $c_2$ has an
advantage over $c_1$ in the access to the cheaper facility $f_2$, and therefore in no optimal dual solution will client
$c_2$ pay more than $\epsilon$ for the opening of facility $f_1$. In consequence, most of the cost of opening $f_1$ is
paid by the dual budget of scenario $A_1$. Therefore, the dual budget $V_{A_1}$ is strictly greater than the primal
bound $\mathrm{Val}_{A_1}$, which we use as an estimate of the cost of the optimal solution in scenario $A_1$.

Bearing the above example in mind, we construct an LP-rounding algorithm that does not rely on the
dual bound on the length of the created connections. Instead, we use a primal bound, which is obtained by scaling
the opening variables a little more and using just a subset of fractionally connected facilities for each client
in the process of creating clusters. Such a simple filtering technique, whose origins can be found in the 
work of Lin and Vitter [8], provides slightly weaker but entirely primal, per-scenario bounds. 

Algorithm. As before, we describe a randomized LP-rounding algorithm that transforms the fractional
solution $(x^*, y^*)$ into an integral solution $(\hat{x}, \hat{y})$ with bounded expected cost. The expectation is over the
random choices of the algorithm, but not over the random choice of the scenario.

We start by scaling the fractional solution $(x^*, y^*)$ by a factor of $\gamma > 2$. As a result, we obtain a
fractional solution $(\bar{x}, \bar{y})$ with $\bar{x}_{A,ij} = \gamma \cdot x^*_{A,ij}$, $\bar{y}_i = \gamma \cdot y^*_i$, and $\bar{y}_{A,i} = \gamma \cdot y^*_{A,i}$. Note that the scaled fractional
solution $(\bar{x}, \bar{y})$ may have facilities with fractional opening of more than 1. Again, for simplicity of the
analysis, we do not round these facility-opening values to 1, but rather split such facilities. More precisely,
we split each facility i with fractional opening $\bar{y}_i > \bar{x}_{A,ij} > 0$ (or $\bar{y}_{A,i} > \bar{x}_{A,ij} > 0$) for some (A, j) into i'
and i'', such that $\bar{y}_{i'} = \bar{x}_{A,ij}$ and $\bar{y}_{i''} = \bar{y}_i - \bar{x}_{A,ij}$. We also split facilities whose fractional opening exceeds
one.

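As before, define $\bar{x}^{(I)}_{A,ij} = \min\{\bar{x}_{A,ij}, \bar{y}_i\}$ and $\bar{x}^{(II)}_{A,ij} = \bar{x}_{A,ij} - \bar{x}^{(I)}_{A,ij}$. Define

$F^{I}_{(j,A)} = \operatorname{argmin}_{F' \subseteq \mathcal{F} :\ \sum_{i \in F'} \bar{x}^{(I)}_{A,ij} \ge 1} \ \max_{i \in F'} c_{ij}$ if $\sum_{i \in \mathcal{F}} \bar{x}^{(I)}_{A,ij} \ge 1$,

$F^{II}_{(j,A)} = \operatorname{argmin}_{F' \subseteq \mathcal{F} :\ \sum_{i \in F'} \bar{x}^{(II)}_{A,ij} \ge 1} \ \max_{i \in F'} c_{ij}$ if $\sum_{i \in \mathcal{F}} \bar{x}^{(II)}_{A,ij} \ge 1$,

and let each of these sets be empty when its condition fails.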

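Note that these sets can easily be computed by considering facilities in an order of non-decreasing
distances $c_{ij}$ to the considered client j. Since we can split facilities, w.l.o.g., for all $j \in \mathcal{C}$ we assume that
if $F^{I}_{(j,A)}$ is nonempty then $\sum_{i \in F^{I}_{(j,A)}} \bar{x}^{(I)}_{A,ij} = 1$, and if $F^{II}_{(j,A)}$ is nonempty then $\sum_{i \in F^{II}_{(j,A)}} \bar{x}^{(II)}_{A,ij} = 1$. Define
$d^{I}_{(j,A)} = \max_{i \in F^{I}_{(j,A)}} c_{ij}$ and $d^{II}_{(j,A)} = \max_{i \in F^{II}_{(j,A)}} c_{ij}$. Let $d_{(j,A)} = \min\{d^{I}_{(j,A)}, d^{II}_{(j,A)}\}$.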

For a client-scenario pair (j, A), if we have $d_{(j,A)} = d^{I}_{(j,A)}$, then we call such a pair first-stage clustered,
and put its cluster candidate $F_{(j,A)} = F^{I}_{(j,A)}$. Otherwise, if $d_{(j,A)} = d^{II}_{(j,A)} < d^{I}_{(j,A)}$, we say that (j, A) is
second-stage clustered and put its cluster candidate $F_{(j,A)} = F^{II}_{(j,A)}$.
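The construction of a cluster candidate can be sketched as a scan over facilities in order of non-decreasing distance, accumulating stage-specific connection mass until it reaches 1 and splitting the last facility if needed. This is a minimal illustration under assumptions: the function name `cluster_candidate`, the dictionary-based input format, and the sample numbers are hypothetical and not taken from the paper.

```python
def cluster_candidate(dist, mass, eps=1e-12):
    """Closest facilities carrying total mass exactly 1, or None if the mass is below 1.

    dist[i] is the distance c_ij; mass[i] is the scaled stage-specific
    connection value of facility i for the fixed pair (j, A).
    Returns (list of (facility, mass used), largest distance in the set).
    """
    if sum(mass.values()) < 1 - eps:
        return None
    chosen, total = [], 0.0
    for i in sorted(mass, key=lambda f: dist[f]):
        if mass[i] <= eps:
            continue
        take = min(mass[i], 1.0 - total)  # split the last facility if necessary
        chosen.append((i, take))
        total += take
        if total >= 1 - eps:
            return chosen, dist[i]
    return None

# Hypothetical data for a single client-scenario pair.
dist = {"a": 1.0, "b": 2.0, "c": 5.0}
x_I = {"a": 0.7, "b": 0.6, "c": 0.0}    # first-stage part of the connection
x_II = {"a": 0.0, "b": 0.2, "c": 1.0}   # second-stage part of the connection

cand_I = cluster_candidate(dist, x_I)
cand_II = cluster_candidate(dist, x_II)
d_I = cand_I[1] if cand_I else float("inf")
d_II = cand_II[1] if cand_II else float("inf")
stage = "first" if d_I <= d_II else "second"  # ties go to the first stage
print(stage, min(d_I, d_II))
```

Treating an empty candidate set as having infinite radius is a convention of this sketch; it only ensures that the nonempty candidate is the one selected.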
Recall that we use $C^*_{(j,A)} = \sum_i c_{ij} x^*_{A,ij}$ to denote the fractional connection cost of client j in scenario
A. Let us now argue that distances to facilities in cluster candidates are not too large.

Lemma B.1 $d_{(j,A)} \le \frac{\gamma}{\gamma - 2} \cdot C^*_{(j,A)}$ for all pairs (j, A).

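Proof: Fix a client-scenario pair (j, A). Assume $F_{(j,A)} = F^{I}_{(j,A)}$ (the other case is symmetric). Recall that
in this case we have $d_{(j,A)} = d^{I}_{(j,A)} \le d^{II}_{(j,A)}$. Consider the following two subcases.

Case 1. $\sum_{i \in F^{II}_{(j,A)}} \bar{x}^{(II)}_{A,ij} = 1$.

Observe that we have $c_{ij} \ge d_{(j,A)}$ for all $i \in F' = \mathcal{F} \setminus (F^{I}_{(j,A)} \cup F^{II}_{(j,A)})$. Note also that $\sum_{i \in F'} \bar{x}_{A,ij} = \gamma - 2$
and $\sum_{i \in F'} x^*_{A,ij} = \frac{\gamma - 2}{\gamma}$. Hence, $C^*_{(j,A)} = \sum_{i \in \mathcal{F}} x^*_{A,ij} c_{ij} \ge \sum_{i \in F'} x^*_{A,ij} c_{ij} \ge \frac{\gamma - 2}{\gamma} \cdot d_{(j,A)}$.

Case 2. $\sum_{i \in F^{II}_{(j,A)}} \bar{x}^{(II)}_{A,ij} = 0$, which implies that $\sum_{i \in \mathcal{F}} \bar{x}^{(II)}_{A,ij} < 1$. Observe that now we have $\sum_{i \in \mathcal{F}} \bar{x}^{(I)}_{A,ij} > \gamma - 1$,
and therefore $\sum_{i \in \mathcal{F} \setminus F^{I}_{(j,A)}} \bar{x}^{(I)}_{A,ij} > \gamma - 2$. Recall that $c_{ij} \ge d_{(j,A)}$ for all $i \in (\mathcal{F} \setminus F^{I}_{(j,A)})$, hence
$C^*_{(j,A)} = \sum_{i \in \mathcal{F}} x^*_{A,ij} c_{ij} \ge \sum_{i \in \mathcal{F} \setminus F^{I}_{(j,A)}} x^*_{A,ij} c_{ij} \ge \frac{\gamma - 2}{\gamma} \cdot d_{(j,A)}$. □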

As in Section 3.1, the algorithm opens facilities randomly in each of the stages, with the probability
of opening facility i equal to $\bar{y}_i$ in Stage I and $\bar{y}_{A,i}$ in Stage II of scenario A. Some facilities are grouped in
disjoint clusters in order to correlate the opening of facilities from a single cluster. The clusters are formed
in each stage by the following procedure. Let all facilities be initially unclustered. In Stage I, consider all
first-stage clustered client-scenario pairs, i.e., pairs (j, A) such that $d_{(j,A)} = d^{I}_{(j,A)}$ (in Stage II of scenario
A, consider all second-stage clustered client-scenario pairs) in the order of non-decreasing values $d_{(j,A)}$. If
the set of facilities $F_{(j,A)}$ contains no facility from the previously formed clusters, then form a new cluster
containing the facilities from $F_{(j,A)}$; otherwise do nothing. In each stage, open exactly one facility in each
cluster. Recall that the total fractional opening of facilities in each cluster equals 1. Within each cluster,
choose the facility randomly with the probability of opening facility i equal to the fractional opening $\bar{y}_i$
in Stage I, or $\bar{y}_{A,i}$ in Stage II of scenario A. Each unclustered facility i is opened independently with
probability $\bar{y}_i$ in Stage I, and with probability $\bar{y}_{A,i}$ in Stage II of scenario A.

Finally, at the end of Stage II of scenario A, connect each client $j \in A$ to the closest open facility.
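The greedy cluster formation and the correlated opening within one stage can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the names `form_clusters` and `round_stage`, the input format, and the sample data are hypothetical, and facilities are assumed to have been split already so that the openings in each cluster sum to 1.

```python
import random

def form_clusters(candidates):
    """Greedy clustering for one stage.

    candidates: list of (d, facility_set), one entry per pair clustered in this stage.
    Pairs are scanned by non-decreasing d; a candidate set becomes a cluster only
    if it is disjoint from all previously formed clusters.
    """
    clusters, used = [], set()
    for _, facs in sorted(candidates, key=lambda c: c[0]):
        if used.isdisjoint(facs):
            clusters.append(set(facs))
            used |= set(facs)
    return clusters

def round_stage(opening, clusters, rng=random):
    """Open exactly one facility per cluster, and unclustered facilities independently."""
    opened = set()
    clustered = set().union(*clusters) if clusters else set()
    for cluster in clusters:
        members = sorted(cluster)
        weights = [opening[i] for i in members]  # assumed to sum to 1
        opened.add(rng.choices(members, weights=weights, k=1)[0])
    for i, y in opening.items():
        if i not in clustered and rng.random() < y:
            opened.add(i)
    return opened

# Hypothetical stage data.
cands = [(1.0, {"a", "b"}), (2.0, {"b", "c"}), (3.0, {"d"})]
clusters = form_clusters(cands)
opening = {"a": 0.4, "b": 0.6, "c": 0.5, "d": 1.0, "e": 0.3}
opened = round_stage(opening, clusters, random.Random(0))
print(clusters, opened)
```

In this example the candidate {b, c} is skipped because it intersects the earlier cluster {a, b}, so c is rounded independently like any other unclustered facility.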

Analysis. The expected facility-opening cost is obviously $\gamma$ times the fractional opening cost. More
precisely, the expected facility-opening cost in scenario A equals $\gamma \cdot F^*_A = \gamma \cdot \sum_{i \in \mathcal{F}} (f^I_i y^*_i + f^A_i y^*_{A,i})$. It
remains to bound the expected connection cost in scenario A in terms of $C^*_A = \sum_{j \in A} \sum_i c_{ij} x^*_{A,ij}$.

Lemma B.2 The expected connection cost in scenario A is at most $(1 + \frac{2\gamma + 2}{\gamma - 2} e^{-\gamma}) \cdot C^*_A$.

Proof: Consider a single client-scenario pair (j, A). Observe that the facilities fractionally connected to j
in scenario A have the total fractional opening of $\gamma$ in the scaled facility-opening vector $\bar{y}$. Since there is
no positive correlation (only negative correlation in the disjoint clusters formed by the algorithm), with
probability at least $1 - e^{-\gamma}$ at least one such facility will be opened. Moreover, by Lemma 2.3, the expected
distance to the closest of the open facilities from this set will be at most the fractional connection cost
$C^*_{(j,A)}$.

Just as in the proof of Lemma 3.1, from the greedy construction of the clusters in each phase of the
algorithm, with probability 1 there exists a facility i opened by the algorithm such that $c_{ij} \le 3 \cdot d_{(j,A)}$. We
connect client j to facility i if no facility from those fractionally serving (j, A) was opened. We obtain
that the expected connection cost of client j is at most $(1 - e^{-\gamma}) \cdot C^*_{(j,A)} + e^{-\gamma} \cdot 3 d_{(j,A)}$. By Lemma B.1,
this can be bounded by $(1 - e^{-\gamma}) \cdot C^*_{(j,A)} + e^{-\gamma} \cdot 3 \cdot \frac{\gamma}{\gamma - 2} \cdot C^*_{(j,A)} = (1 + \frac{2\gamma + 2}{\gamma - 2} e^{-\gamma}) \cdot C^*_{(j,A)}$. □

To equalize the opening and connection cost approximation ratios we solve $1 + \frac{2\gamma + 2}{\gamma - 2} e^{-\gamma} = \gamma$ and
obtain the following.
Theorem B.3 The described algorithm with $\gamma = 2.4957$ delivers solutions such that the expected cost in
each scenario A is at most 2.4957 times the fractional cost in scenario A.

C Robust fault-tolerant UFL 

C.1 The (k + 5 + 4/k)-approximation rounding routine

Like in the algorithms in the previous sections we first scale up the fractional facility-opening costs, we 
then cluster certain facilities to correlate their opening, and then use a randomized rounding routine to 
decide the subset of facilities to open. Once we open facilities and the adversary chooses which k of them 
to close, clients get connected to the closest of the remaining open facilities. 

Let $(x^*, y^*)$ be an optimal solution to the above LP relaxation of the problem. We first scale up the
opening of facilities by $\gamma = k + 5 + 4/k$, i.e., we set $\bar{y}_i = \min\{1, \gamma \cdot y^*_i\}$. We also set $\bar{x}_{A,ij} = \min\{1, \gamma \cdot x^*_{A,ij}\}$.

Consider a single client-scenario pair (j, A). Consider facilities i fractionally serving this pair in solution
$(x^*, y^*)$ in an order $i^1, i^2, \ldots$ of non-decreasing distance $c_{ij}$. Let i' be the first facility in this order such
that $x^*_{A,i^1 j} + x^*_{A,i^2 j} + \cdots + x^*_{A,i' j} \ge \frac{k+1}{\gamma} = \frac{k}{k+4}$. Recall that $C^*_{(j,A)} = \sum_i c_{ij} x^*_{A,ij}$ denotes the fractional connection
cost of client j in scenario A. By an argument analogous to the one in Lemma 2.3, we obtain that
$c_{i'j} \le \frac{k^2 + 5k + 4}{3k} \cdot C^*_{(j,A)}$. We now distinguish two cases.

Case 1. There exists i among $i^1, i^2, \ldots, i'$ such that $\bar{y}_i = 1$. Then facility i will be deterministically opened
by the algorithm. Note that since $x^*_{A,ij} > 0$, we have $i \notin A$ (i.e., facility i is not closed by the adversary in
scenario A); hence, we can connect j to i in scenario A in our constructed integral solution. It remains to
observe that $c_{ij} \le \frac{k^2 + 5k + 4}{3k} \cdot C^*_{(j,A)} \le (k + 5 + 4/k) \cdot C^*_{(j,A)}$ is a distance that we can accept.

Case 2. There is no i among $i^1, i^2, \ldots, i'$ such that $\bar{y}_i = 1$. Then we have $\bar{x}_{A,i^1 j} + \bar{x}_{A,i^2 j} + \cdots + \bar{x}_{A,i' j} \ge k + 1$,
which is the fractional connection to at least k + 1 facilities, each of them within the distance of
$\frac{k^2 + 5k + 4}{3k} \cdot C^*_{(j,A)}$. With a randomized rounding technique described below, they will be turned into k + 1
facilities opened within the distance of $3 \cdot \frac{k^2 + 5k + 4}{3k} \cdot C^*_{(j,A)} = (k + 5 + 4/k) \cdot C^*_{(j,A)}$. Since at most k of
these facilities will be closed by the adversary, there remains an open facility for client j in scenario A at
distance at most $(k + 5 + 4/k) \cdot C^*_{(j,A)}$.
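The constants in the two cases can be sanity-checked with exact rational arithmetic. The sketch below assumes that the prefix threshold is $(k+1)/\gamma$ and that the per-facility distance bound is $\frac{k^2+5k+4}{3k}$ times the fractional connection cost, with $\gamma = k + 5 + 4/k$; the helper name `constants` is hypothetical.

```python
from fractions import Fraction

def constants(k: int):
    """Return (gamma, threshold, radius) for a given k, as exact fractions."""
    gamma = Fraction(k) + 5 + Fraction(4, k)      # scaling factor k + 5 + 4/k
    threshold = Fraction(k + 1) / gamma           # fractional mass collected up to i'
    radius = Fraction(k * k + 5 * k + 4, 3 * k)   # distance bound in units of C*_(j,A)
    return gamma, threshold, radius

for k in range(1, 50):
    gamma, threshold, radius = constants(k)
    assert threshold == Fraction(k, k + 4)   # (k+1)/gamma simplifies to k/(k+4)
    assert gamma * threshold == k + 1        # scaled mass reaches k + 1 in Case 2
    assert 3 * radius == gamma               # tripling the radius gives the ratio
```

The identities hold because $k^2 + 5k + 4 = (k+1)(k+4)$, so $\gamma = (k+1)(k+4)/k$.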

It remains to argue that we can turn k + 1 fractional connections to facilities at distance at most d into 
k + 1 integral connections to facilities at distance at most 3d. This can be seen as a situation typical for LP- 
rounding algorithms for the standard fault-tolerant facility location problem. Indeed, exactly this property 
is associated with the rounding scheme in [3] (Sections 3 and 4). It is obtained by carefully constructing 
a laminar family of subsets of facilities and performing a dependent rounding procedure guided by the 
subsets. It can also be thought of as an application of the pipage-rounding technique [1].

C.2 Better bound in the oblivious setting

Let us now consider the oblivious setting where the k facilities to close/fail are chosen without the knowledge 
of our opening of facilities. In this setting we give a bound on the expected connection cost, where the 
expectation is over the random choices of the algorithm. More precisely, we will argue that the expected 
connection cost of client j in scenario A is bounded with respect to the fractional connection cost of j in 
scenario A. 

The difference from the previous setting is that now we can use the argument that, after scaling the
facility-opening variables by a constant $\gamma$, for a client j in scenario A, with probability at least $1 - e^{-\gamma}$
at least one facility from those fractionally serving (j, A) will be opened. Moreover, we can bound the
expected distance to such a facility by the fractional connection cost of (j, A). This allows us to use those
facilities that get opened with certainty (as described in Section C.1) only with a certain small probability.
In such a situation, it is beneficial to scale the facility-opening variables by a little less.

The algorithm is as in Section C.1; only the scaling parameter $\gamma$ is smaller (say $\gamma = 1.5 + k$), and
the analysis is different. We argue that for every client j in each single scenario (choice of the k facilities
to close) A the expected connection cost is bounded. As before we distinguish two cases.
Case 1. (There exists i such that $\bar{x}_{A,ij} = 1$.) If there is such a facility at distance at most $(1.5 + k) \cdot C^*_{(j,A)}$, we
just connect to it. Otherwise, the average connection cost in $\bar{x}$ is only smaller than the average connection
cost in $x^*$, and we may use a version of Lemma 2.3 to argue that the expected connection cost to the
closest of the facilities randomly opened by the algorithm is at most $C^*_{(j,A)}$.

Case 2. (There is no such facility, and therefore there is no i among $i^1, i^2, \ldots, i'$ such that $\bar{y}_i = 1$.) Then we
have $\bar{x}_{A,i^1 j} + \bar{x}_{A,i^2 j} + \cdots + \bar{x}_{A,i' j} \ge k + 1$, which is the fractional connection to at least k + 1 facilities, each of
them within the distance of $(3 + 2k) \cdot C^*_{(j,A)}$. Just as in Section C.1, we argue that as a result of dependent
rounding we obtain k + 1 facilities deterministically opened within the distance $3(3 + 2k) \cdot C^*_{(j,A)}$. We now
propose a suboptimal assignment procedure to bound the cost of the optimal one. In the suboptimal
assignment, the client first looks at the facilities fractionally serving it. If one of them is open, then it connects
to the closest one, which incurs an expected cost of $C^*_{(j,A)}$; if none is open, then it takes a facility
deterministically opened within distance $3(3 + 2k) \cdot C^*_{(j,A)}$. As for the other results in this paper, we then
argue that the expected connection cost is at most $(1 - e^{-(1.5+k)}) \cdot C^*_{(j,A)} + e^{-(1.5+k)} \cdot 3(3 + 2k) \cdot C^*_{(j,A)}$,
which is less than $(1.5 + k) \cdot C^*_{(j,A)}$ for $k \ge 2$.
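The final inequality of Case 2 can be checked numerically. The sketch assumes that the failure probability is bounded by $e^{-\gamma}$ with $\gamma = 1.5 + k$ and that the fallback distance is $3(3 + 2k)$ times the fractional connection cost; the function name `oblivious_factor` is hypothetical.

```python
from math import exp

def oblivious_factor(k: int) -> float:
    """(1 - e^{-gamma}) + e^{-gamma} * 3 * (3 + 2k), with gamma = 1.5 + k (assumed)."""
    gamma = 1.5 + k
    miss = exp(-gamma)  # bound on P[no fractionally serving facility is opened]
    return (1 - miss) + miss * 3 * (3 + 2 * k)

for k in range(2, 11):
    # The connection-cost factor stays below the scaling factor 1.5 + k.
    assert oblivious_factor(k) < 1.5 + k
    print(k, round(oblivious_factor(k), 4))
```

Because the failure probability decays exponentially in k while the fallback distance grows only linearly, the factor approaches 1 as k increases.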

C.3 Integrality gap example 

Let us now show that the program (3)-(6) has integrality gap at least $k + 1 - \epsilon$. Consider the following
instance. There is a single client and n identical facilities. All the facility-opening costs are 1, and all the
connection costs are 0. The optimal fractional solution opens each facility to the extent of $\frac{1}{n-k}$, incurring
cost $\frac{n}{n-k} \to 1$ (as $n \to \infty$). Any integral solution, however, needs to open at least k + 1 facilities and therefore has
cost at least k + 1. Therefore, for any $\alpha < k + 1$ there exists an instance of the k-robust fault-tolerant
problem for which the integrality gap of the program (3)-(6) is at least $\alpha$.
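The gap of this instance can be computed exactly for concrete n. The sketch below assumes the fractional cost $\frac{n}{n-k}$ and the integral cost k + 1 described above; the function name `gap` and the sample values of n are hypothetical.

```python
from fractions import Fraction

def gap(n: int, k: int) -> Fraction:
    """Ratio of the integral optimum (k + 1) to the fractional cost n / (n - k)."""
    assert n > k >= 1
    fractional = Fraction(n, n - k)  # n facilities, each opened to the extent 1/(n - k)
    integral = k + 1                 # at least k + 1 facilities must be opened
    return integral / fractional

for n in (10, 100, 1000):
    print(n, float(gap(n, 2)))       # approaches k + 1 = 3 as n grows
```

The ratio equals $(k+1)(n-k)/n$, which is below k + 1 for every finite n and tends to k + 1 as n grows.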